Document Indexing With a Concept Hierarchy
Abstract
We discuss the task of selecting the concepts that describe the contents of a given document. We propose using a large hierarchical concept dictionary (thesaurus) for this task and present a statistical method of document indexing driven by such a dictionary. We discuss the problem of handling non-terminal nodes in the hierarchy and present methods, compliant with common sense, for automatically assigning weights to the nodes and links in the hierarchy. The application of the method in the Classifier system is also discussed.
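The abstract does not give the scoring formulas, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a toy hierarchy in which terminal nodes carry weighted keyword lists and a non-terminal node's score is the link-weighted sum of its children's scores. All names (`HIERARCHY`, `KEYWORDS`, `index_document`) and all weights are hypothetical.

```python
import re
from collections import Counter

# Hypothetical toy hierarchy: node -> list of (child, link_weight) pairs.
HIERARCHY = {
    "science": [("physics", 0.5), ("biology", 0.5)],
    "physics": [],
    "biology": [],
}

# Hypothetical weighted keyword lists attached to the terminal nodes.
KEYWORDS = {
    "physics": {"quantum": 1.0, "particle": 1.0},
    "biology": {"cell": 1.0, "gene": 1.0},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def score_node(node, counts, scores):
    """Score a node as its own keyword hits plus weighted child scores."""
    own = sum(w * counts[k] for k, w in KEYWORDS.get(node, {}).items())
    children = sum(
        link_w * score_node(child, counts, scores)
        for child, link_w in HIERARCHY.get(node, [])
    )
    scores[node] = own + children
    return scores[node]


def index_document(text, root="science"):
    """Return a score for every concept reachable from the root."""
    counts = Counter(tokenize(text))
    scores = {}
    score_node(root, counts, scores)
    return scores


scores = index_document("Quantum particle physics: a quantum view")
best_leaf = max(("physics", "biology"), key=lambda n: scores[n])
```

In this sketch the link weights damp the contribution of a child to its parent, so a general concept is selected only when several of its children are supported by the text. How the weights should actually be assigned is exactly the question the paper addresses.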
Similar resources
Conceptual document indexing using a large scale semantic dictionary providing a concept hierarchy
Automatic indexing is one of the important technologies used for Textual Data Analysis applications. Standard document indexing techniques usually identify the most relevant keywords in the documents. This paper presents an alternative approach that aims at performing document indexing by associating concepts with the document to index instead of extracting keywords out of it. The concepts are ...
Document Indexing with a Concept Hierarchy / Índice de Documentos con una Jerarquía de Conceptos
Given a large hierarchical concept dictionary (thesaurus, or ontology), the task of selection of the concepts that describe the contents of a given document is considered. A statistical method of document indexing driven by such a dictionary is proposed. The method is insensitive to inaccuracies in the dictionary, which allows for semi-automatic translation of the hierarchy into different languag...
Indexing with a Concept Hierarchy
Given a large hierarchical concept dictionary (thesaurus, or ontology), the task of selection of the concepts that describe the contents of a given document is considered. A statistical method of document indexing driven by such a dictionary is proposed. The method is insensitive to inaccuracies in the dictionary, which allows for semiautomatic translation of the hierarchy into different language...
Document Retrieval through Concept Hierarchy Formulation
The enormous growth of the Internet and the widespread use of computer systems in general have created very large collections of electronic documents, and the methods existing so far have proved unable to handle the massive amount of unstructured documents. In this article we discuss a variant of document retrieval, where traditional indexing is augmented by concept hierarchy (composed by observing conc...
IDSIS: Intelligent Document Semantic Indexing System
Zhongzhi Shi, Bin Wu, Qing He, Xiujun Gong, Shaohui Liu, Yi Zheng. Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences. Abstract: With the rapid growth of the Internet, how to get information from this huge information space becomes an increasingly important problem. In this paper, an Intelligent Document Semantic Indexi...